CATERPILLAR DUALITIES AND REGULAR LANGUAGES



PÉTER L. ERDŐS, CLAUDE TARDIF, AND GÁBOR TARDOS



Abstract. We characterize obstruction sets in caterpillar dualities in terms
of regular languages, and give a construction of the dual of a regular family of
caterpillars. We show that these duals correspond to the constraint satisfaction
problems definable by a monadic linear Datalog program with at most one
EDB per rule.



1. Introduction 

A homomorphism duality is a couple $(\mathcal{O}, \mathbf{D})$ where $\mathbf{D}$ is a relational structure
and $\mathcal{O}$ is a family of relational structures of the same type, such that the following
holds.

For any given relational structure $\mathbf{A}$, there exists a homomorphism
from $\mathbf{A}$ to $\mathbf{D}$ if and only if there is no homomorphism from any
member $\mathbf{T}$ of $\mathcal{O}$ to $\mathbf{A}$.

Significant dualities typically correspond to efficient algorithms for constraint
satisfaction problems. These include finite dualities (where the family $\mathcal{O}$ is finite), tree
dualities (where $\mathcal{O}$ is a family of trees) and bounded treewidth dualities (where $\mathcal{O}$
is a family of structures with bounded treewidth). More examples are discussed
in [?].

"Characterizing dualities" may refer to two distinct types of problems.

• Characterizing targets: Deciding, given a structure $\mathbf{D}$, whether there exists
a family $\mathcal{O}_{\mathbf{D}}$ of structures in a given class (e.g. trees) such that $(\mathcal{O}_{\mathbf{D}}, \mathbf{D})$ is
a duality.

• Characterizing obstruction sets: Deciding, given a family $\mathcal{O}$, whether there
exists a structure $\mathbf{D}_{\mathcal{O}}$ such that $(\mathcal{O}, \mathbf{D}_{\mathcal{O}})$ is a duality.

The two problems are different. In the case of finite dualities, the characterization
of obstruction sets was obtained in 2000 ([9]), and that of targets in 2007 ([8]). The
problem of characterizing targets was solved in 1998 ([7]) for tree dualities, and
recently in 2009 ([1]) for bounded treewidth dualities. Characterizing obstruction
sets remains an open problem both for tree duality and bounded treewidth duality.
The difficulty in characterizing obstruction sets may depend on how the obstructions
are represented. In the case of finite dualities, an explicit description of the
obstructions is always possible. For infinite families of obstructions, fragments of
the Datalog language have proved to be an efficient tool to describe families of
obstructions implicitly, through their homomorphic images. The structures with
tree duality and bounded treewidth duality all have obstruction sets that can be
described in Datalog.

In [3], Carvalho, Dalmau and Krokhin introduced caterpillar dualities as the
dualities $(\mathcal{O}, \mathbf{D})$ where $\mathcal{O}$ is describable in the smallest natural recursive fragment
of Datalog, namely "monadic linear Datalog with at most one EDB per rule" (see
Section 4). They proved that the corresponding targets $\mathbf{D}$ are precisely those which
are homomorphically equivalent to a structure with lattice polymorphisms, and that
they are recognizable by the existence of a homomorphism of a given superstructure
$\mathcal{C}(\mathbf{D})$ to $\mathbf{D}$ (see Section 5).

The purpose of the present paper is to complement the work of Carvalho, Dalmau 
and Krokhin by solving the characterization of obstructions problem for caterpillar 
dualities. We will consider a representation of caterpillars by words over a suitable 
alphabet, and show that caterpillar dualities correspond to regular languages. In 
particular, this shows that every program in "monadic linear Datalog with at most 
one EDB per rule" describes the obstruction set of a caterpillar duality. This 
extends some methods developed in [4] to study antichain dualities for digraphs.
The case of general tree dualities is considered in [5].

We will provide the necessary background in the next section, and prove our
main result in Section 3. The link with Datalog is given in Section 4, and relevant
constructions and extensions are discussed in Section 5.

2. Preliminaries 

Relational structures. A type is a finite set $\sigma = \{R_1, \dots, R_m\}$ of relation symbols,
each with an arity $r_i$ assigned to it. A $\sigma$-structure is a relational structure
$\mathbf{A} = \langle A; R_1(\mathbf{A}), \dots, R_m(\mathbf{A}) \rangle$ where $A$ is a non-empty set called the universe of $\mathbf{A}$, and
$R_i(\mathbf{A})$ is an $r_i$-ary relation on $A$ for each $i$. The elements of $R_i(\mathbf{A})$, $1 \le i \le m$,
will be called hyperedges of $\mathbf{A}$. By analogy with the graph theoretic setting, the
universe of $\mathbf{A}$ will also be called its vertex-set, denoted $V(\mathbf{A})$.

A $\sigma$-structure $\mathbf{A}$ may be described by its bipartite incidence multigraph $\mathrm{Inc}(\mathbf{A})$
defined as follows. The two parts of $\mathrm{Inc}(\mathbf{A})$ are $V(\mathbf{A})$ and $\mathrm{Block}(\mathbf{A})$, where

$$\mathrm{Block}(\mathbf{A}) = \{(R, (x_1, \dots, x_r)) : R \in \sigma \text{ has arity } r \text{ and } (x_1, \dots, x_r) \in R(\mathbf{A})\},$$

and with edges $e_{a,i,B}$ joining $a \in V(\mathbf{A})$ to $B = (R, (x_1, \dots, x_r)) \in \mathrm{Block}(\mathbf{A})$ when
$x_i = a$. Thus, the degree of $B = (R, (x_1, \dots, x_r))$ in $\mathrm{Inc}(\mathbf{A})$ is precisely $r$. Here
"degree" means number of incident edges rather than number of neighbors because
parallel edges are possible: If $x_i = x_j = a \in V(\mathbf{A})$, then $e_{a,i,B}$ and $e_{a,j,B}$ both
join $a$ and $B$. An element $a \in V(\mathbf{A})$ is called a leaf if it has degree one in $\mathrm{Inc}(\mathbf{A})$,
and a non-leaf otherwise. Similarly, a block of $\mathbf{A}$ is called pendant if it is incident
to at most one non-leaf, and non-pendant otherwise. A $\sigma$-structure $\mathbf{T}$ is called a
$\sigma$-tree (or tree for short) if $\mathrm{Inc}(\mathbf{T})$ is a (graph-theoretic) tree, that is, it is connected
and has no cycles or parallel edges. A $\sigma$-tree is called a path if it has at most two
pendant blocks. A $\sigma$-tree is called a caterpillar if it is either a path or it can be
turned into a path by removing all its pendant blocks (and the leaves attached to
them).

Homomorphisms. For $\sigma$-structures $\mathbf{A}$ and $\mathbf{B}$, a homomorphism from $\mathbf{A}$ to $\mathbf{B}$ is a
map $f : V(\mathbf{A}) \to V(\mathbf{B})$ such that $f(R_i(\mathbf{A})) \subseteq R_i(\mathbf{B})$ for all $i = 1, \dots, m$, where for
any relation $R \in \sigma$ of arity $r$ we have

$$f(R) = \{(f(x_1), \dots, f(x_r)) : (x_1, \dots, x_r) \in R\}.$$

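We write $\mathbf{A} \to \mathbf{B}$ if there exists a homomorphism from $\mathbf{A}$ to $\mathbf{B}$, and $\mathbf{A} \not\to \mathbf{B}$
otherwise. We write $\mathbf{A} \leftrightarrow \mathbf{B}$ when $\mathbf{A} \to \mathbf{B}$ and $\mathbf{B} \to \mathbf{A}$; $\mathbf{A}$ and $\mathbf{B}$ are then called
homomorphically equivalent. For a finite structure $\mathbf{A}$, we can always find a structure
$\mathbf{B}$ such that $\mathbf{A} \leftrightarrow \mathbf{B}$ and the cardinality of $V(\mathbf{B})$ is minimal with respect to this
property. It is well known (see [5]) that any two such structures are isomorphic.
We then call $\mathbf{B}$ the core of $\mathbf{A}$.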
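For finite structures, the definition of a homomorphism can be checked by exhaustive search. The Python sketch below is only an illustration and is exponential in the size of the source; the encoding of a structure as a dictionary from relation names to sets of tuples, and the names `is_homomorphism` and `exists_homomorphism`, are assumptions made here and do not come from the paper.

```python
from itertools import product

def is_homomorphism(f, A, B):
    """f maps elements of A to elements of B; A and B map relation names to sets of tuples."""
    return all(tuple(f[x] for x in t) in B.get(R, set())
               for R, tuples in A.items() for t in tuples)

def exists_homomorphism(VA, A, VB, B):
    """Brute force: try every map from the universe VA to the universe VB."""
    VA, VB = list(VA), list(VB)
    for images in product(VB, repeat=len(VA)):
        if is_homomorphism(dict(zip(VA, images)), A, B):
            return True
    return False

# Hypothetical digraph examples with a single binary relation E.
C3 = {"E": {(0, 1), (1, 2), (2, 0)}}   # directed 3-cycle
K2 = {"E": {(0, 1)}}                   # a single directed edge
print(exists_homomorphism(range(2), K2, range(3), C3))  # the edge maps into the cycle
print(exists_homomorphism(range(3), C3, range(2), K2))  # the cycle does not map to the edge
```

The universes are passed explicitly because a structure may contain elements that occur in no hyperedge.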

Automata.

When the type $\sigma$ consists only of binary relations, a $\sigma$-structure $\mathbf{A}$ is an edge-labeled
directed graph. If we specify sets $I, T \subseteq V(\mathbf{A})$ of initial and terminal states
respectively, we get a nondeterministic automaton $(\mathbf{A}, I, T)$. The type $\sigma$ is then
viewed as an alphabet. A word $w \in \sigma^*$ naturally corresponds to a directed $\sigma$-path
$\mathbf{P}_w$ with $|w|$ edges with labels successively specified by the letters of $w$. A walk is a
homomorphism $\phi : \mathbf{P}_w \to \mathbf{A}$. If $\phi$ maps the first and last vertices of $\mathbf{P}_w$ to vertices
in $I$ and $T$ respectively, then the word $w$ is accepted by $(\mathbf{A}, I, T)$. The set of such
words is called the language accepted by $(\mathbf{A}, I, T)$.

We recall a few basic facts from automata theory. The reader is referred to
standard references (e.g. [11]) for a thorough treatment. A language $\mathcal{L} \subseteq \sigma^*$ is
called regular if it is the language accepted by some nondeterministic automaton.
It is well known that a language is regular if and only if it can be described by
a "regular expression", that is, an expression constructed from letters in $\sigma$ using
unions, concatenation and the star operation. Regular languages are also preserved
by other basic operations such as intersection and complementation.

An automaton $(\mathbf{A}, I, T)$ is called deterministic if $I$ is a singleton and for every
$a \in V(\mathbf{A})$ and $R \in \sigma$, there is a unique $b \in V(\mathbf{A})$ such that $(a, b) \in R(\mathbf{A})$. It
is well known that for every nondeterministic automaton $(\mathbf{A}, I, T)$, there exists a
deterministic automaton $\Delta(\mathbf{A}, I, T)$ which accepts the same language.
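One standard way to obtain such a deterministic automaton is the subset construction, sketched below in Python. This is an assumption about how the determinization may be realized, not a construction spelled out in the text; the transition encoding (a dictionary from `(state, letter)` to a set of states) and the names `determinize` and `accepts_dfa` are hypothetical. Including the empty set as a state makes the result total, as required by the definition of a deterministic automaton above.

```python
def determinize(alphabet, delta, initial, terminal):
    """Subset construction. delta maps (state, letter) to a set of successor states."""
    start = frozenset(initial)
    states, todo, trans = {start}, [start], {}
    while todo:
        S = todo.pop()
        for letter in alphabet:
            # Successor subset: everything reachable from S by reading `letter`.
            U = frozenset(q for p in S for q in delta.get((p, letter), ()))
            trans[(S, letter)] = U
            if U not in states:
                states.add(U)
                todo.append(U)
    final = {S for S in states if S & set(terminal)}
    return states, trans, start, final

def accepts_dfa(trans, start, final, word):
    """Follow the unique run of the deterministic automaton on `word`."""
    S = start
    for letter in word:
        S = trans[(S, letter)]
    return S in final

# Hypothetical example: words over {a, b} that end with "ab".
delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
states, trans, start, final = determinize("ab", delta, {0}, {2})
print(len(states), accepts_dfa(trans, start, final, "aab"))
```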

3. Caterpillars 

Graph-theoretic caterpillars consist of a path "body" to which are connected a
number of pendant "leg" edges. Similarly, the non-leaves of a general caterpillar $\mathbf{T}$
(with at least two blocks) can be linearly ordered $x_1, \dots, x_n$ such that $x_i, x_{i+1}$ are
incident to one common non-pendant block $B_i$ for $i = 1, \dots, n-1$. The remaining
blocks of $\mathbf{T}$ are pendant, and each of them is incident to one of $x_1, \dots, x_n$. In this
section we present a way to represent caterpillars by words over a suitable alphabet.



$V(\mathbf{T}) = \{a, \dots, k\}$
$\sigma = \{R, S, P\}$ with arities 4, 3, 2.
$R = \{(a,b,c,d)\}$; $S = \{(e,c,f), (c,g,h), (i,h,j)\}$; $P = \{(j,k)\}$.

Figure 1. The caterpillar $\mathbf{T}$
[Figure: drawing of $\mathrm{Inc}(\mathbf{T})$. Labels: non-pendant blocks, non-leaf elements. Blocks: $R(a,b,c,d)$, $S(e,c,f)$, $S(c,g,h)$, $S(i,h,j)$, $P(j,k)$. Sample edge label: $e_{c,3,R(a,b,c,d)}$.]

Figure 2. The bipartite graph $\mathrm{Inc}(\mathbf{T})$
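The notions of degree, leaf and pendant block can be checked mechanically on the caterpillar of Figures 1 and 2. The Python sketch below is an illustration only; the dictionary encoding and the function names are assumptions made here, and the code assumes that every element occurs in at least one block, so that the universe can be read off the tuples.

```python
from collections import Counter

# The caterpillar T of Figure 1, encoded as relation name -> set of tuples.
T = {
    "R": {("a", "b", "c", "d")},
    "S": {("e", "c", "f"), ("c", "g", "h"), ("i", "h", "j")},
    "P": {("j", "k")},
}

def blocks(A):
    """All blocks (R, tuple) of the structure, in a fixed order."""
    return [(R, t) for R, tuples in sorted(A.items()) for t in sorted(tuples)]

def degrees(A):
    """Degree in Inc(A): one incident edge per position occupied, so repeats count."""
    deg = Counter()
    for _, t in blocks(A):
        deg.update(t)
    return deg

def non_leaves(A):
    """Elements whose degree in Inc(A) is not one (here: at least two)."""
    return {x for x, d in degrees(A).items() if d != 1}

def pendant_blocks(A):
    """Blocks incident to at most one non-leaf."""
    nl = non_leaves(A)
    return [(R, t) for R, t in blocks(A) if len(set(t) & nl) <= 1]

print(sorted(non_leaves(T)))
print(pendant_blocks(T))
```

On this example the non-leaves are `c`, `h`, `j`, and the two blocks that are not pendant are `S(c,g,h)` and `S(i,h,j)`, which form the body of the caterpillar.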

Given a type $\sigma$, we define $\sigma_2$ as follows: For every $R \in \sigma$ of arity $k$ and for every
$(i,j) \in \{1, \dots, k\}^2$, $\sigma_2$ contains the symbol $R^{(i,j)}$. Thus $\sigma_2$ can be viewed as an
alphabet or as a type consisting of binary relations.

As an alphabet, $\sigma_2$ allows us to represent $\sigma$-caterpillars in a natural way: If $\mathbf{T}$ is a
$\sigma$-caterpillar and $x_1, \dots, x_n$ are its non-leaves with their natural ordering, then $\mathbf{T}$
corresponds to the $\sigma_2$-word

$$X_1 L_1 X_2 L_2 X_3 \cdots X_{n-1} L_{n-1} X_n$$

where $X_i$ is the concatenation of all letters $R^{(j,j)}$ such that $\mathbf{T}$ has a pendant block
$(R, (a_1, \dots, a_k))$ with $a_j = x_i$, and $L_i$ is $R^{(j,\ell)}$ such that $\mathbf{T}$ has a non-pendant
block $(R, (a_1, \dots, a_k))$ with $a_j = x_i$, $a_\ell = x_{i+1}$. A caterpillar consisting of a single
block $(R, (a_1, \dots, a_k))$ can be represented by any of the letters $R^{(i,j)}$, and the
caterpillar consisting of one vertex and no blocks is represented by the empty word.
In general, different words may represent the same caterpillar. However, a caterpillar
may be retrieved from any word representing it. This retrieval is essentially a
functor from the category of $\sigma_2$-structures (where $\sigma_2$ is interpreted as a type) to
that of $\sigma$-structures, as detailed below.



(a) $\sigma_2 = \{R^{11}, \dots, R^{14}, R^{21}, \dots, R^{44}, S^{11}, S^{12}, S^{13}, S^{21}, \dots, S^{33}, P^{11}, P^{12}, P^{21}, P^{22}\}$

(b) $R^{33} S^{22} S^{13} S^{23} P^{11}$
    $S^{12} R^{33} S^{13} S^{23} P^{12}$

Figure 3. (a) The alphabet $\sigma_2(\mathbf{T})$ (b) two of the 28 words describing $\mathrm{Inc}(\mathbf{T})$



There is a natural functor $\beta$ which takes a $\sigma$-structure $\mathbf{A}$ and produces a corresponding
$\sigma_2$-structure $\beta(\mathbf{A})$: We put $V(\beta(\mathbf{A})) = V(\mathbf{A})$, and for $R \in \sigma$ and
$(x_1, \dots, x_k) \in R(\mathbf{A})$, we put $(x_i, x_j) \in R^{(i,j)}(\beta(\mathbf{A}))$ for all $(i,j) \in \{1, \dots, k\}^2$. The
functor $\beta$ is a right adjoint in the sense of [3], [10], thus there exists a corresponding
left adjoint $\beta^*$ such that for a $\sigma_2$-structure $\mathbf{A}$ and a $\sigma$-structure $\mathbf{B}$, we have

$$\beta^*(\mathbf{A}) \to \mathbf{B} \iff \mathbf{A} \to \beta(\mathbf{B}).$$

The $\sigma$-structure $\beta^*(\mathbf{A})$ is constructed as follows. We first construct an auxiliary
structure $\beta^*(\mathbf{A})^+$. For each element $x \in V(\mathbf{A})$, $V(\beta^*(\mathbf{A})^+)$ contains a corresponding
(isolated) element $x'$, and for each $(x,y) \in R^{(i,j)}(\mathbf{A})$, $V(\beta^*(\mathbf{A})^+)$ contains
additional elements $x_1, \dots, x_k$ (where $R \in \sigma$ has arity $k$) and the hyperedge
$(x_1, \dots, x_k) \in R(\beta^*(\mathbf{A})^+)$. $\beta^*(\mathbf{A})$ is then the quotient $(\beta^*(\mathbf{A})^+)/\sim$ obtained
through natural identifications. That is, for $(x, y) \in R^{(i,j)}(\mathbf{A})$ and the corresponding
$(x_1, \dots, x_k) \in R(\beta^*(\mathbf{A})^+)$, $\sim$ identifies $x_i$ with $x'$ and $x_j$ with $y'$.
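The two constructions can be made concrete in a few lines of Python. The sketch below is an illustration under assumptions made here: a $\sigma_2$-letter $R^{(i,j)}$ is encoded as the triple `(R, i, j)`, structures are dictionaries from relation names (or letters) to sets of tuples, and the quotient by $\sim$ is computed with a small union-find. The names `beta`, `beta_star` and `path_of_word` are hypothetical. As a check, the path of the first word of Figure 3(b) is sent by `beta_star` to a structure with eleven elements and the same block counts as the caterpillar of Figure 1.

```python
def beta(A, arity):
    """sigma-structure -> sigma_2-structure, keyed by letters (R, i, j)."""
    out = {}
    for R, tuples in A.items():
        for t in tuples:
            for i in range(1, arity[R] + 1):
                for j in range(1, arity[R] + 1):
                    out.setdefault((R, i, j), set()).add((t[i - 1], t[j - 1]))
    return out

def beta_star(V, A2, arity):
    """sigma_2-structure -> sigma-structure: one fresh hyperedge per pair, then quotient."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for v in V:
        find(("old", v))                       # the isolated copy x' of each element x
    raw = []
    for (R, i, j), pairs in sorted(A2.items()):
        for n, (x, y) in enumerate(sorted(pairs)):
            new = [("new", R, i, j, n, p) for p in range(1, arity[R] + 1)]
            for e in new:
                find(e)
            parent[find(new[i - 1])] = find(("old", x))   # identify x_i with x'
            parent[find(new[j - 1])] = find(("old", y))   # identify x_j with y'
            raw.append((R, new))
    out = {}
    for R, new in raw:
        out.setdefault(R, set()).add(tuple(find(e) for e in new))
    return {find(e) for e in list(parent)}, out

def path_of_word(word):
    """The sigma_2-path with vertices 0..len(word), one labeled edge per letter."""
    A2 = {}
    for pos, letter in enumerate(word):
        A2.setdefault(letter, set()).add((pos, pos + 1))
    return range(len(word) + 1), A2

arity = {"R": 4, "S": 3, "P": 2}
w = [("R", 3, 3), ("S", 2, 2), ("S", 1, 3), ("S", 2, 3), ("P", 1, 1)]
V, Pw = path_of_word(w)
universe, T = beta_star(V, Pw, arity)
print(len(universe), {R: len(ts) for R, ts in T.items()})
```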

Note that the construction of $\beta^*(\mathbf{A}) = (\beta^*(\mathbf{A})^+)/\sim$ may identify elements
$x', y'$ that correspond to distinct elements $x, y \in V(\mathbf{A})$. This happens when for
$x, y \in V(\mathbf{A})$, $x \neq y$, there is some $R^{(i,i)} \in \sigma_2$ such that $(x, y) \in R^{(i,i)}(\mathbf{A})$. In particular,
a $\sigma$-caterpillar $\mathbf{T}$ is described by a $\sigma_2$-word $w$, with letters of type $R^{(i,i)}$ describing
its legs. In turn, $w$ naturally corresponds to the $\sigma_2$-path $\mathbf{P}_w$ with $|w| + 1$ elements
successively joined by the relations indicated by the letters of $w$. We then have
$\beta^*(\mathbf{P}_w) \cong \mathbf{T}$. The adjunction property between $\beta$ and $\beta^*$ implies the following.

Lemma 3.1. Let $\sigma$ be a type and $\mathbf{A}$ a $\sigma$-structure. Then the family of $\sigma_2$-words
describing the caterpillars that admit homomorphisms to $\mathbf{A}$ is a regular language.

Proof. Let $\mathbf{T}$ be a $\sigma$-caterpillar, $w$ a word describing it and $\mathbf{P}_w$ the $\sigma_2$-path corresponding
to $w$. Then the adjunction property yields

$$\mathbf{P}_w \to \beta(\mathbf{A}) \iff \beta^*(\mathbf{P}_w) \to \mathbf{A},$$

with $\beta^*(\mathbf{P}_w) \cong \mathbf{T}$. Since $\beta(\mathbf{A})$ can be viewed as a nondeterministic automaton
with all states being initial and terminal, this shows that the corresponding words
$w$ indeed constitute a regular language. □
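The automaton in this proof can be simulated directly, without building $\beta(\mathbf{A})$ explicitly. The Python sketch below is an illustration under assumptions made here: letters $R^{(i,j)}$ are encoded as triples `(R, i, j)`, structures as dictionaries from relation names to sets of tuples, and the function name `word_maps_to` is hypothetical. It tracks the set of elements of $\mathbf{A}$ reachable as the image of the last vertex of the path read so far.

```python
def word_maps_to(word, V, A):
    """True iff the sigma_2-path of `word` admits a homomorphism to beta(A)."""
    current = set(V)                       # every state of beta(A) is initial
    for (R, i, j) in word:
        # One step along a letter R^(i,j): from x_i to x_j inside a hyperedge of R.
        current = {t[j - 1] for t in A.get(R, ()) if t[i - 1] in current}
    return bool(current)                   # every state of beta(A) is terminal

# Hypothetical digraph example: a single directed edge.
edge = {"E": {(0, 1)}}
print(word_maps_to([("E", 1, 2)], {0, 1}, edge))
print(word_maps_to([("E", 1, 2), ("E", 1, 2)], {0, 1}, edge))
print(word_maps_to([("E", 1, 2), ("E", 2, 1)], {0, 1}, edge))

# The caterpillar of Figure 1 and the first word of Figure 3(b).
T = {"R": {("a", "b", "c", "d")},
     "S": {("e", "c", "f"), ("c", "g", "h"), ("i", "h", "j")},
     "P": {("j", "k")}}
w = [("R", 3, 3), ("S", 2, 2), ("S", 1, 3), ("S", 2, 3), ("P", 1, 1)]
print(word_maps_to(w, set("abcdefghijk"), T))
```

In the digraph example, the second word describes a directed path with two edges, which does not map to a single edge, while the third describes two edges with a common head, which does.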

Since the complement of a regular language is again regular, the family of caterpillar
obstructions of any $\sigma$-structure $\mathbf{A}$ is again described by a regular language.

Theorem 3.2. Let $\sigma$ be a type, $\mathcal{L}$ a regular language over $\sigma_2$ and $\mathcal{O}$ the family of
$\sigma$-caterpillars represented by $\mathcal{L}$. Then there exists a $\sigma$-structure $\mathbf{A}$ such that $(\mathcal{O}, \mathbf{A})$
is a homomorphism duality.

Proof. Let $(\mathbf{D}, I, T)$ be a deterministic automaton which recognizes $\mathcal{L}$. We define
the structure $\mathbf{A} = \Gamma(\mathbf{D}, I, T)$ as follows. $V(\mathbf{A})$ is the set of subsets of $V(\mathbf{D})$
containing the initial state but none of the terminal states. For a relation $R \in \sigma$
of arity $k$, $R(\mathbf{A})$ is defined as follows: We put $(X_1, \dots, X_k) \in R(\mathbf{A})$ if for all
$(i,j) \in \{1, \dots, k\}^2$ and for all $a \in X_i$, the unique $b$ such that $(a, b) \in R^{(i,j)}(\mathbf{D})$ is
in $X_j$.

Let $\mathbf{B}$ be a structure such that no caterpillar described by $\mathcal{L}$ admits a homomorphism
to $\mathbf{B}$. Let $w$ be a word over $\sigma_2$ such that there exists a homomorphism
$\phi : \beta^*(\mathbf{P}_w) \to \mathbf{B}$. Then $\phi$ induces a homomorphism $\phi_2 : \mathbf{P}_w \to \beta(\mathbf{B})$, and we
denote by $b_{w,\phi}$ the image of the last vertex of $\mathbf{P}_w$ under $\phi_2$. Also, there is a unique
homomorphism of $\mathbf{P}_w$ to $\mathbf{D}$ mapping the first element to the start state, and we denote
by $d_w$ the image of the last vertex of $\mathbf{P}_w$. Using every possible $w$ and $\phi : \beta^*(\mathbf{P}_w) \to \mathbf{B}$
we define a map $\psi : V(\mathbf{B}) \to \mathcal{P}(V(\mathbf{D}))$ as follows. For an element $b$ of $\mathbf{B}$, $\psi(b)$ is the
set of all elements $d_w$ such that $b = b_{w,\phi}$. Then $\psi(b)$ always contains the start state
(because the empty word represents the one-element caterpillar with no hyperedges,
which can be mapped to $b$) and never a terminal state (because $w \notin \mathcal{L}$ whenever
$\beta^*(\mathbf{P}_w) \to \mathbf{B}$, so $d_w$ can never be a terminal state). Thus $\psi$ is a map from $V(\mathbf{B})$ to $V(\mathbf{A})$.
We prove that it is a homomorphism of $\mathbf{B}$ to $\mathbf{A}$. Let $R$ be a relation in $\sigma$ of arity $k$, and
$(b_1, \dots, b_k) \in R(\mathbf{B})$. For $(i, j) \in \{1, \dots, k\}^2$ and $d \in \psi(b_i)$, there exists a word $w$ such
that $d_w = d$ and there exists a homomorphism $\phi : \beta^*(\mathbf{P}_w) \to \mathbf{B}$ such that $b_{w,\phi} = b_i$. By
appending $R^{(i,j)}$ to $w$, we get a new word $w'$ such that $\phi : \beta^*(\mathbf{P}_w) \to \mathbf{B}$ naturally extends to
$\phi' : \beta^*(\mathbf{P}_{w'}) \to \mathbf{B}$, with $b_{w',\phi'} = b_j$. Therefore the unique element $d_{w'}$ such that
$(d_w, d_{w'}) \in R^{(i,j)}(\mathbf{D})$ is in $\psi(b_j)$. This shows that $\psi$ is a homomorphism.

Therefore, if no caterpillar described by $\mathcal{L}$ admits a homomorphism to $\mathbf{B}$, then
$\mathbf{B}$ admits a homomorphism to $\mathbf{A}$. It remains to prove that no caterpillar described
by $\mathcal{L}$ admits a homomorphism to $\mathbf{A}$. For $w \in \mathcal{L}$, suppose that there exists a
homomorphism $\phi : \beta^*(\mathbf{P}_w) \to \mathbf{A}$. This corresponds to a homomorphism
$\phi_2 : \mathbf{P}_w \to \beta(\mathbf{A})$. Since the start state is in the image of the first element of $\mathbf{P}_w$, a
terminal state is in the image of its last element, which is impossible. □
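The construction $\Gamma$ from this proof can be carried out by brute force for small automata. The Python sketch below is an illustration only, under assumptions made here: the deterministic automaton is given by a total transition dictionary keyed by `(state, (R, i, j))`, and the function name `gamma` and the state names are hypothetical. The example takes digraphs (one binary relation $E$) and the language consisting of the single word $E^{(1,2)}E^{(1,2)}$, which describes the directed path with two edges.

```python
from itertools import combinations, product

def gamma(states, trans, start, final, arity):
    """Vertices: sets of states containing `start` and no terminal state."""
    free = [q for q in states if q != start and q not in final]
    vertices = [] if start in final else [
        frozenset((start,) + c)
        for r in range(len(free) + 1) for c in combinations(free, r)]
    relations = {}
    for R, k in arity.items():
        rel = set()
        for Xs in product(vertices, repeat=k):
            # (X_1..X_k) is a hyperedge iff every R^(i,j)-transition leads from X_i into X_j.
            if all(trans[(a, (R, i, j))] in Xs[j - 1]
                   for i in range(1, k + 1) for j in range(1, k + 1)
                   for a in Xs[i - 1]):
                rel.add(Xs)
        relations[R] = rel
    return vertices, relations

# Deterministic automaton for the one-word language E^(1,2) E^(1,2).
letters = [("E", i, j) for i in (1, 2) for j in (1, 2)]
states = ["s0", "s1", "s2", "dead"]
trans = {(q, a): "dead" for q in states for a in letters}
trans[("s0", ("E", 1, 2))] = "s1"
trans[("s1", ("E", 1, 2))] = "s2"
vertices, relations = gamma(states, trans, "s0", {"s2"}, {"E": 2})
print(len(vertices), relations["E"])
```

On this input the sketch yields four vertices and a single edge, from `{s0, dead}` to `{s0, s1, dead}`. The result is therefore homomorphically equivalent to a single directed edge, in line with the well-known fact that a digraph maps to one directed edge exactly when it contains no directed walk with two edges.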

According to Theorem 3.2, for every regular $\sigma_2$-language $\mathcal{L}$, there exists a duality
$(\mathcal{O}, \mathbf{A})$ such that $\mathcal{O}$ is the family of caterpillars described by $\mathcal{L}$. However, $\mathcal{L}$
may be smaller than the set $\mathcal{L}^+$ of all words describing caterpillar obstructions to
$\mathbf{A}$; by Lemma 3.1, $\mathcal{L}^+$ is nevertheless regular (since its complement is regular).
Between $\mathcal{L}$ and $\mathcal{L}^+$ there are usually non-regular languages which also describe
complete sets of obstructions to $\mathbf{A}$. There may even be such non-regular languages
that do not contain $\mathcal{L}$. Therefore, the complete characterization of obstruction sets
for caterpillar dualities may be stated as follows:

Theorem 3.3. Let $\mathcal{L}$ be a $\sigma_2$-language, $\mathcal{O}$ the family of $\sigma$-caterpillars described by
$\mathcal{L}$, $\mathcal{O}^+$ the family of $\sigma$-caterpillars which contain homomorphic images of members
of $\mathcal{O}$ and $\mathcal{L}^+$ the collection of words describing these caterpillars. Then there exists
a duality $(\mathcal{O}, \mathbf{A})$ if and only if $\mathcal{L}^+$ is regular.

4. Caterpillar Datalog programs 

A caterpillar Datalog program is a "monadic linear Datalog program with at
most one EDB per rule", that is, a set of rules of the form

(1)    $a \in p_i \leftarrow b \in p_j$ and $(x_1, \dots, x_k) \in R$ with $x_m = a$, $x_n = b$.

Here $R$ is a relation in a type $\sigma$ of arity $k$ (called an extensional database or EDB),
and $p_i$, $p_j$ are unary auxiliary relations that are not in $\sigma$ and that will be defined
recursively (they are called intensional databases or IDBs). The auxiliary relations
are monadic, that is, unary, and the program is "linear" since at most one auxiliary
relation is used in the condition on the right side of the arrow. (See [7] for a
description of general Datalog programs.) In addition, the first rule is a formal
initialization:

(2)    $a \in p_1 \leftarrow a = a$,

and there are terminal rules of the form

(3)    goal $\leftarrow a \in p_i$.

A Datalog program is usually seen as a way to construct unary relations $p_1, p_2, \dots$
in a $\sigma$-structure $\mathbf{B}$ recursively, by a repeated application of the rules that apply,
until a certain "goal" is achieved. Note that all the rules can be rewritten in terms
of the type $\sigma_2$: The rule (1) can be written

(4)    $a \in p_i \leftarrow b \in p_j$ and $(b, a) \in R^{(n,m)}$.

In this modified form, the program can be executed in $\beta(\mathbf{B})$. We see that the "goal"
is achieved when a certain $\sigma_2$-walk is found in $\beta(\mathbf{B})$, which corresponds to finding
a homomorphic image of the corresponding caterpillar in $\mathbf{B}$.

Therefore, a caterpillar Datalog program will achieve its goal on the structures
which contain homomorphic images of caterpillars belonging to a certain family. To
see that this family is regular, we consider the nondeterministic automaton $(\mathbf{C}, I, T)$
of type $\sigma_2$ described by the rules of the program: $V(\mathbf{C})$ is the set of IDBs of the
program, and for each rule

$a \in p_i \leftarrow b \in p_j$ and $(b, a) \in R^{(n,m)}$

we put $(p_j, p_i) \in R^{(n,m)}(\mathbf{C})$. We put $I = \{p_1\}$, and the terminal states are the states
$p_i$ appearing in terminal rules. Thus a goal-achieving derivation in a structure $\mathbf{B}$
must correspond to a word accepted by $(\mathbf{C}, I, T)$, and the family of such words is
regular. Combining this with Theorem 3.2 we get the following.
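The translation from rules to automaton, and the evaluation of a program on a structure, can both be sketched in Python. The encoding is an assumption made here: a rule of the form (1) is the tuple `(i, j, R, m, n)`, IDBs are referred to by their indices, and the names `program_to_automaton` and `achieves_goal` are hypothetical. The example program has two rules and detects a directed path with two edges in a digraph.

```python
def program_to_automaton(rules, terminal):
    """Rule (i, j, R, m, n) gives a transition from p_j to p_i labeled R^(n,m)."""
    delta = {}
    for (i, j, R, m, n) in rules:
        delta.setdefault((j, (R, n, m)), set()).add(i)
    return delta, {1}, set(terminal)       # the initial state is p_1

def achieves_goal(rules, terminal, V, B):
    """Naive fixpoint evaluation of the program on the structure B with universe V."""
    p = {1: set(V)}                        # rule (2): every element belongs to p_1
    changed = True
    while changed:
        changed = False
        for (i, j, R, m, n) in rules:
            for t in B.get(R, ()):
                # x_n = b must already be in p_j; then x_m = a is put in p_i.
                if t[n - 1] in p.get(j, set()) and t[m - 1] not in p.setdefault(i, set()):
                    p[i].add(t[m - 1])
                    changed = True
    return any(p.get(i) for i in terminal)

# a in p_2 <- b in p_1 and (b, a) in E;  a in p_3 <- b in p_2 and (b, a) in E;  goal <- a in p_3.
rules = [(2, 1, "E", 2, 1), (3, 2, "E", 2, 1)]
delta, initial, final = program_to_automaton(rules, [3])
print(delta)
print(achieves_goal(rules, [3], {0, 1, 2}, {"E": {(0, 1), (1, 2)}}))
print(achieves_goal(rules, [3], {0, 1}, {"E": {(0, 1)}}))
```

The resulting automaton accepts exactly the word $E^{(1,2)}E^{(1,2)}$, and the program achieves its goal on the two-edge path but not on a single edge.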



Theorem 4.1. For every caterpillar Datalog program, there exists a structure $\mathbf{A}$
such that an input structure $\mathbf{B}$ admits a homomorphism to $\mathbf{A}$ if and only if the
program does not achieve its goal on $\mathbf{B}$.

5. Construction and characterization of duals 

For a type $\sigma$, a regular $\sigma_2$-language $\mathcal{L}$ may be described by a regular expression,
an automaton (deterministic or nondeterministic) which recognizes it or a caterpillar
Datalog program. The previous section explains how to convert a caterpillar
Datalog program into a nondeterministic automaton which recognizes the same
language. We refer to [11] for the conversion from regular expression to automaton,
and for the construction $\Delta$ which takes a nondeterministic automaton $(\mathbf{B}, I, T)$
and constructs a deterministic automaton $(\mathbf{D}, I', T') = \Delta(\mathbf{B}, I, T)$ which accepts
the same language. Thus, if the regular $\sigma_2$-language $\mathcal{L}$ is recognized by the automaton
$(\mathbf{B}, I, T)$, then the corresponding caterpillar duality is $(\mathcal{O}, \mathbf{A})$, where $\mathcal{O}$ is
the family of $\sigma$-caterpillars described by $\mathcal{L}$ and $\mathbf{A} = \Gamma \circ \Delta(\mathbf{B}, I, T)$, $\Gamma$ being the
construction described in the proof of Theorem 3.2.

Now for any $\sigma$-structure $\mathbf{A}$, $(\beta(\mathbf{A}), V(\mathbf{A}), V(\mathbf{A}))$ is a nondeterministic automaton
which recognizes the $\sigma_2$-language of words describing caterpillars which admit a
homomorphism to $\mathbf{A}$, and $\Delta(\beta(\mathbf{A}), V(\mathbf{A}), V(\mathbf{A}))$ is a deterministic automaton which
serves the same purpose. Let $\Delta^*(\beta(\mathbf{A}), V(\mathbf{A}), V(\mathbf{A}))$ be the deterministic automaton
obtained from $\Delta(\beta(\mathbf{A}), V(\mathbf{A}), V(\mathbf{A}))$ by interchanging the set of terminal states
with its complement. Then $\Delta^*(\beta(\mathbf{A}), V(\mathbf{A}), V(\mathbf{A}))$ is a deterministic automaton
which recognizes the $\sigma_2$-language of words describing the set $\mathcal{O}$ of caterpillars which
do not admit a homomorphism to $\mathbf{A}$, and $(\mathcal{O}, \Gamma \circ \Delta^*(\beta(\mathbf{A}), V(\mathbf{A}), V(\mathbf{A})))$ is the
corresponding caterpillar duality, which has the following properties.

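Theorem 5.1. $\mathcal{C}(\mathbf{A}) = \Gamma \circ \Delta^*(\beta(\mathbf{A}), V(\mathbf{A}), V(\mathbf{A}))$ has caterpillar duality, and for
any $\sigma$-structure $\mathbf{B}$ with caterpillar duality, there exists a homomorphism of $\mathbf{A}$ to $\mathbf{B}$
if and only if there exists a homomorphism of $\mathcal{C}(\mathbf{A})$ to $\mathbf{B}$. In particular, $\mathbf{A}$ itself
has caterpillar duality if and only if there exists a homomorphism of $\mathcal{C}(\mathbf{A})$ to $\mathbf{A}$.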

This is essentially the characterization obtained in [3]. Note that $\Delta^*$ and $\Gamma$ are
both exponential constructions, so that $\mathcal{C}$ is a doubly exponential construction.
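The doubly exponential growth can be made explicit with a rough count. The bounds below are a sketch that assumes $\Delta$ is realized by the usual subset construction, so that its states are subsets of $V(\mathbf{A})$; the vertices of $\Gamma(\mathbf{D}, I, T)$ are subsets of $V(\mathbf{D})$ by definition. Writing $n = |V(\mathbf{A})|$:

```latex
\[
  \bigl|V\bigl(\Delta^*(\beta(\mathbf{A}), V(\mathbf{A}), V(\mathbf{A}))\bigr)\bigr| \le 2^{n},
  \qquad
  \bigl|V\bigl(\mathcal{C}(\mathbf{A})\bigr)\bigr| \le 2^{2^{n}},
  \qquad n = |V(\mathbf{A})| .
\]
```

These are upper bounds only; nothing here says how often they are attained.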

With a slight modification, the same type of characterization also holds for caterpillar
dualities with additional properties. The most distinctive case is that of path
dualities, where the obstructions are described by words not containing any of the
symbols $R^{(i,i)}$ such that $R \in \sigma$ has arity at least 2. For a $\sigma$-structure $\mathbf{A}$, let $\mathcal{L}_{\mathbf{A}}$ be
the language describing the caterpillar obstructions to $\mathbf{A}$, and $\mathcal{L}_P \subseteq \sigma_2^*$ be the set
of words not containing any of the symbols $R^{(i,i)}$ such that $R \in \sigma$ has arity at least
2. Then $\mathcal{L}_P$ and $\mathcal{L}_P \cap \mathcal{L}_{\mathbf{A}}$ are regular languages, hence with the construction $\Gamma$ we
can build a structure $\mathcal{C}_P(\mathbf{A})$ such that $\mathbf{A}$ has path duality if and only if there exists
a homomorphism of $\mathcal{C}_P(\mathbf{A})$ to $\mathbf{A}$. A similar statement holds for any intersection
$\mathcal{L} \cap \mathcal{L}_{\mathbf{A}}$, where $\mathcal{L} \subseteq \sigma_2^*$ is a regular language.